[NNCF]: Add INT8 weight compression conformance test for Tinyllama-1.1b PyTorch model #2636
Conversation
TODO: Make the test
compress() and _compress_torch() methods were implemented
@alexsu52 Requesting review as per @MaximProshin's guideline.
Codecov Report: All modified and coverable lines are covered by tests ✅

@@            Coverage Diff             @@
##           develop    #2636       +/-   ##
============================================
- Coverage    91.21%   29.95%   -61.26%
============================================
  Files          494      494
  Lines        45775    45775
============================================
- Hits         41753    13713    -28040
- Misses        4022    32062    +28040

See 330 files with indirect coverage changes.
@alexsu52 I think I fixed the code, could you please approve the workflow?
Please check that you have pushed all the changes because I don't see the changes you mentioned in the description.
Please provide local results of the test run.
Good evening. Sorry, those initial changes are irrelevant. I changed the code a bit because the initial code was not passing the pipeline.
Will send screenshots in a bit.
Command:
pytest tests/post_training/test_quantize_conformance.py::test_weight_compression -s --data=tests/post_training/data/ -k tinyllama_int8_data_free_backend_PT
Output:
Your command does not run any test. There are
The tinyllama_int8_data_free_backend_TORCH test failed with a runtime error.
A general comment: please add a description of the test case you implemented to the PR description.
@alexsu52 Oh, I just used the command provided in the issue. I'll fix it and tag you asap. Thanks for the feedback.
TODO: Maybe make it in a way where I check for INT8 instead of BackendType.TORCH,
Co-authored-by: Aleksander <[email protected]>
@alexsu52 Hi, could you verify the logic I followed? It looks good so far. You can check the updated PR description for a heads-up.
We have a similar test pipeline for the PyTorch model in the PTQ test:
def test_ptq_quantization(
You need to reproduce the same pipeline for the weight compression test.
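The requested structure (set up a pipeline, run it, compare collected metrics against references) can be sketched as below. This is an illustrative stand-in, not the actual NNCF conformance-test code: WeightCompressionPipeline, RunInfo, and all reference numbers are hypothetical placeholders.

```python
# Sketch of a weight-compression test mirroring the PTQ conformance-test
# structure: build a pipeline, run it, check results against references.
# All class names and numeric values here are illustrative placeholders.
from dataclasses import dataclass
from typing import Optional

@dataclass
class RunInfo:
    model: str
    backend: str
    metric_value: Optional[float] = None
    num_int8: Optional[int] = None
    num_int4: Optional[int] = None
    status: str = "not_started"

class WeightCompressionPipeline:
    """Stub: a real pipeline would export, compress, and validate a model."""

    def __init__(self, model_id: str, backend: str):
        self.run_info = RunInfo(model=model_id, backend=backend)

    def run(self) -> RunInfo:
        # A real run would export an FP32 OpenVINO model, compress weights
        # to INT8, export the compressed model, then measure similarity.
        self.run_info.metric_value = 0.98  # placeholder measurement
        self.run_info.num_int8 = 312       # placeholder op count
        self.run_info.num_int4 = 0
        self.run_info.status = "completed"
        return self.run_info

# Hypothetical reference values the test would compare against.
REFERENCE = {"metric_value": 0.95, "num_int8": 312, "num_int4": 0}

def test_weight_compression():
    info = WeightCompressionPipeline(
        "tinyllama/tinyllama-1.1b-step-50k-105b", backend="TORCH"
    ).run()
    assert info.status == "completed"
    assert info.num_int8 == REFERENCE["num_int8"]
    assert info.num_int4 == REFERENCE["num_int4"]
    assert info.metric_value >= REFERENCE["metric_value"]
```

In the real suite the pipeline object would do the heavy lifting (export, compression, inference); the test body only orchestrates and asserts, which keeps the PTQ and weight-compression tests structurally parallel.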
Will do.
@alexsu52 Good morning, any updates? Do you need me to help you with anything?
Hi, I tried to run your PR and got a runtime error. I'll come back with comments after I've done some experiments.
Oh, that's weird. I'll try to run it as well.
@alexsu52 I removed the problematic line of code; I must have restored it by accidentally undoing right before the commit.
@alexsu52 Did it run?
As I understand, based on your implementation, you are trying to implement the following test flow:
- Create the FP32 PyTorch HF model
- Export the FP32 PyTorch HF model to an FP32 OpenVINO model and save it to fp32_model_dir
- Compress the FP32 PyTorch HF model to INT8
- Export the INT8 PyTorch HF model to an INT8 OpenVINO model and save it to output_model_dir
- Calculate the number of int8 and int4 operations in the INT8 OpenVINO model
- Check the number of int8 and int4 operations against the references
- Calculate the similarity metric between the FP32 OpenVINO model and the INT8 OpenVINO model. The similarity metric is calculated between OpenVINO models for inference optimization on CPU.
- Check the similarity metric against the reference
Is this a correct statement?
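The last two pairs of steps above (counting compressed operations and checking a similarity metric) can be sketched with small illustrative helpers. These are not the NNCF implementations; the op-precision list and the cosine-based metric are assumptions used only to make the idea concrete.

```python
# Illustrative helpers for the final steps of the flow above: count
# int8/int4 operations (e.g. from an OpenVINO model's weight constants)
# and compute a cosine-style similarity between FP32 and INT8 outputs.
import numpy as np

def count_low_precision_ops(op_precisions):
    """Count occurrences of int8/int4 precisions in a list of op precisions."""
    counts = {"int8": 0, "int4": 0}
    for precision in op_precisions:
        if precision in counts:
            counts[precision] += 1
    return counts

def similarity(fp32_logits: np.ndarray, int8_logits: np.ndarray) -> float:
    """Mean cosine similarity between per-token logit vectors."""
    a = fp32_logits / np.linalg.norm(fp32_logits, axis=-1, keepdims=True)
    b = int8_logits / np.linalg.norm(int8_logits, axis=-1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=-1)))

# Toy usage: two int8 constants and one int4 constant among the ops,
# and identical outputs giving a similarity of ~1.0.
counts = count_low_precision_ops(["int8", "f32", "int8", "int4"])
assert counts == {"int8": 2, "int4": 1}
x = np.random.rand(4, 8)
assert similarity(x, x) > 0.999
```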
Thanks for your update. Please pay attention to my comments.
Good day! Yup, thanks for the review.
Yes, except in Step 4: I didn't export it as
Co-authored-by: Alexander Suslov <[email protected]>
Also deleted the following class attributes: MODEL_NAME, MODEL_FUNC
Co-authored-by: Alexander Suslov <[email protected]>
Co-authored-by: Alexander Suslov <[email protected]>
Utilization of export_from_model() function from Optimum
Co-authored-by: Alexander Suslov <[email protected]>
TORCH Backends only
@alexsu52 Thanks for the detailed review. I implemented all the changes and cleaned up the remaining code; could you review it one last time?
@alexsu52 FYI, here's the output:
I have run an internal build to validate your changes. It takes some time.
LGTM. Thanks for the contribution!
build: manual/job/post_training_weight_compression/57
Thanks for the guidance, have a great day.
Changes
- Added INT8 compression test suite to the model_scope
- Added TORCH backend support in the LMWeightCompression class
- For INT8 compression, dataset, as well as some other parameters (see model_scope), are set to None
- Used save_pretrained() for TORCH models
- TORCH models (check the commits for details)

Reason for changes
Requested to benchmark changes via whowhatbench in issue #2527

Related tickets
ref: 130788
Closes #2527

Tests
INT8 weight compression conformance test for Tinyllama-1.1b PyTorch model
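The model_scope entry described in the Changes list (a data-free INT8 case with dataset and other compression parameters set to None) could look roughly like the sketch below. The field names and values are hypothetical, based only on the description above, not on the actual NNCF test-suite schema.

```python
# Hypothetical shape of a model_scope entry for the data-free INT8 case.
# Field names are illustrative; only "dataset is None for data-free INT8"
# and "TORCH backend" come from the PR description above.
from enum import Enum

class BackendType(Enum):
    TORCH = "TORCH"
    OV = "OV"

TINYLLAMA_INT8_DATA_FREE = {
    "reported_name": "tinyllama_int8_data_free",
    "model_id": "tinyllama/tinyllama-1.1b-step-50k-105b",  # assumed HF id
    "pipeline_cls": "LMWeightCompression",  # placeholder reference
    "compression_params": {
        "dataset": None,  # data-free: no calibration dataset
        "mode": "INT8",   # target weight precision
        "ratio": None,    # mixed-precision ratio not used for INT8
    },
    "backends": [BackendType.TORCH],
}
```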